Optimal estimation of a physical observable's expectation value for pure states 
We study the optimal way to estimate the quantum expectation value of a physical observable
when a finite number of copies of a quantum pure state are presented. The optimal estimation is 
determined by minimizing the squared error averaged over all pure states distributed in a unitary 
invariant way. We find that the optimal estimation is "biased" though the optimal measurement 
is given by successive projective measurements of the observable. The optimal estimate is not the 
sample average of observed data, but the arithmetic average of observed and "default nonobserved" 
data, with the latter consisting of all eigenvalues of the observable. 



I. INTRODUCTION 

One of the fundamental tasks in quantum physics is to
determine the expectation value of a physical observable 
of an unknown quantum state. With only a single copy 
of the quantum state given, we cannot determine the ex- 
pectation value of a physical observable because of the 
statistical nature of quantum measurement. Suppose we 
are presented with a certain finite number N of copies 
of an unknown quantum state. We cannot increase the 
number of the copies since the no-cloning theorem [1]
forbids it. Then what is the optimal way to determine
the expectation value of the observable for a given N?
An intuitively plausible optimal estimate is given by the 
arithmetic average of the data produced by successive 
projective measurements of the observable on the indi- 
vidual systems. 

This problem, however, is by no means trivial. Given 
a quantum system composed of subsystems, we can con- 
sider two types of measurement. One is separate mea- 
surements: a sequence of measurements on the individ- 
ual subsystems, possibly dependent on the outcomes of 
earlier measurements. The other is joint measurement: 
a single measurement on the system as a whole. Recent 
studies on quantum-state discrimination and estimation 
[?] provide many instances in which joint measurements
perform better than separate measurements
even for a state composed of mutually uncorrelated sub- 
systems. 

Peres and Wootters showed that a certain set of three 
bipartite product states can be better distinguished by 
a joint measurement [?] (see also [?]). An even
stronger example was provided by Bennett et al. [?],
which shows that a certain orthogonal set of bipartite 
product states cannot be reliably distinguished by any 
separate measurement though a joint measurement per- 
fectly distinguishes them because of their mutual orthog- 
onality. The superiority of joint measurement has also 
been discussed in the problem of quantum-state estima- 
tion for identically prepared copies of an unknown state 
(see [?], for example).

In Ref. [?], D'Ariano, Giovannetti, and Perinotti raised the
question of whether the standard procedure of averaging
the outcomes of repeated measurements of an observable
over equally prepared systems is the best way of estimating
the expectation value of the observable, or whether
a joint measurement can improve the estimation. They
showed that the standard procedure is indeed optimal if
one is restricted to the class of unbiased estimation for
any generally mixed state. Here an estimator is said to be
unbiased if the average over many independent estimates
gives the true value to be estimated.

An unbiased result is certainly one of the desirable 
properties for estimation but not a necessary condition. 
A natural question is then whether a "biased" estimation
performs better than the standard unbiased estimation.
Let us take a simple example, in which we estimate the
expectation value of the observable $\sigma_z$ for a single-qubit
system in an unknown pure state. We assume that the
state of the qubit is chosen according to the uniform distribution
on the Bloch sphere. Suppose that the projective
measurement of $\sigma_z$ produced the outcome 1, which
means the sample average is 1. Now, one can ask if it is
reasonable to conclude that the expectation value of $\sigma_z$
is most likely equal to 1. Note that the expectation value
of $\sigma_z$ is 1 only if the qubit lies exactly at the north pole of
the Bloch sphere. On the other hand, the measurement
of $\sigma_z$ can produce the outcome 1 with some probability
unless the qubit is exactly at the south pole. Therefore, it
is more reasonable to consider that the expectation value
of $\sigma_z$ is not 1, but somewhere between 0 and 1. In fact,
the optimal estimate turns out to be 1/3 in this case, as
we will see in the next section.
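One way to see where the value 1/3 comes from is the following minimal sketch. It assumes two standard facts that are not spelled out in the text: the estimate that minimizes an averaged squared error is the posterior mean, and for a point uniformly distributed on the Bloch sphere the coordinate z (the expectation value of $\sigma_z$) is uniform on [-1, 1]. The function names are illustrative only.

```python
from fractions import Fraction

def integrate_poly(coeffs, lo, hi):
    """Exactly integrate sum(coeffs[k] * z**k) over [lo, hi]."""
    total = Fraction(0)
    for k, c in enumerate(coeffs):
        total += Fraction(c) * (Fraction(hi) ** (k + 1) - Fraction(lo) ** (k + 1)) / (k + 1)
    return total

def posterior_mean_after_plus_one():
    """Posterior mean of z = <sigma_z> given a single outcome +1."""
    # Prior density of z on [-1, 1]: 1/2 (assumption stated in the lead-in).
    # Probability of outcome +1 for a state with coordinate z: (1 + z) / 2.
    # Unnormalized posterior weight: (1/2) * (1 + z) / 2 = 1/4 + z/4.
    weight = [Fraction(1, 4), Fraction(1, 4)]              # 1/4 + z/4
    weight_times_z = [0, Fraction(1, 4), Fraction(1, 4)]   # z/4 + z^2/4
    norm = integrate_poly(weight, -1, 1)
    first_moment = integrate_poly(weight_times_z, -1, 1)
    return first_moment / norm

print(posterior_mean_after_plus_one())
```

The exact rational arithmetic avoids any numerical-integration tolerance; the result agrees with the value quoted above.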

In this paper, without assuming unbiasedness of the 
estimation, we study the optimal procedure for the ex- 
pectation value of a physical observable of an unknown 
pure state, when N copies of the state are presented. We 
assume that the unknown pure state is chosen from the 
pure-state space according to a unitary invariant a priori 
distribution. The optimal estimation is determined by 
minimizing the squared error averaged over the a priori 
distribution. 



II. OPTIMAL ESTIMATION 

We determine the optimal way to estimate the expectation
value of a physical observable $\Omega$ when N copies of
an unknown pure state $\rho = |\phi\rangle\langle\phi|$ on a d-dimensional
Hilbert space $\mathcal{H}$ are given. Let $\{E_a\}$ be a
positive-operator-valued measure (POVM) on the total system
$\mathcal{H}^{\otimes N}$, with outcome labeled $a$ providing an estimate $\omega_a$
for the expectation value given by $\mathrm{tr}[\rho\Omega]$. For a given $\rho$,
the mean squared error in the estimate is written as

$$\Delta(\rho) = \sum_a \mathrm{tr}\left[E_a \rho^{\otimes N}\right] \left(\omega_a - \mathrm{tr}[\rho\Omega]\right)^2 . \tag{1}$$

We will first average this $\Delta(\rho)$ over all pure states $\rho$ and
then minimize it with respect to the POVM $\{E_a\}$ and
the estimate $\{\omega_a\}$.

The distribution of the pure states $\rho$ is specified in the
following way. Expand a pure state as $|\phi\rangle = \sum_{i=1}^{d} c_i |i\rangle$
in terms of an orthonormal base $\{|i\rangle\}$ of $\mathcal{H}$. The distribution
is then defined to be the one in which the 2d-component
real vector $\{x_i = \mathrm{Re}\, c_i,\ y_i = \mathrm{Im}\, c_i\}$ is uniformly
distributed on the $(2d-1)$-dimensional hypersphere
of radius 1. The distribution is unitary invariant
in the sense that it is independent of the orthonormal
base $\{|i\rangle\}$ chosen to define it. Let us denote the average
over this distribution by $\langle\cdots\rangle$. All we need in the following
calculation is a useful relation for the average of $\rho^{\otimes n}$
given in Ref. [?], that is,

$$\left\langle \rho^{\otimes n} \right\rangle = \frac{S_n}{d_n}, \tag{2}$$



where $S_n$ is the projection operator onto the totally symmetric
subspace of $\mathcal{H}^{\otimes n}$ and $d_n$ is its dimension given by
$d_n = \mathrm{tr}[S_n] = {}_{n+d-1}C_{d-1}$. It may be instructive to see
how this formula comes out in some simple cases of qubits
(d = 2), in which the above distribution means that the
Bloch vector $\mathbf{n}$ is uniformly distributed on the surface of
the Bloch sphere. Then we can easily verify

$$\frac{1}{4\pi}\int d\mathbf{n} \left(\frac{1+\mathbf{n}\cdot\boldsymbol{\sigma}}{2}\right)^{\otimes 2} = \frac{1}{24}\left(\boldsymbol{\sigma}(1)+\boldsymbol{\sigma}(2)\right)^2 = \begin{cases} 1/3 & (S=1), \\ 0 & (S=0), \end{cases} \tag{3}$$



where $S$ is the eigenvalue of the total spin. This is a special
case of the formula (2), since the state is symmetric
if S = 1 and antisymmetric if S = 0. The case of three
qubits provides another example:

$$\frac{1}{4\pi}\int d\mathbf{n} \left(\frac{1+\mathbf{n}\cdot\boldsymbol{\sigma}}{2}\right)^{\otimes 3} = \begin{cases} 1/4 & (S=3/2), \\ 0 & (S=1/2). \end{cases} \tag{4}$$
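The nonzero values in Eqs. (3) and (4) are just $1/d_n$ for d = 2, as formula (2) predicts. A small sketch of that bookkeeping, using only the binomial expression $d_n = {}_{n+d-1}C_{d-1}$ quoted above (the function names are illustrative, not from the paper):

```python
from fractions import Fraction
from math import comb

def symmetric_dim(n, d):
    """Dimension d_n = C(n + d - 1, d - 1) of the totally symmetric subspace."""
    return comb(n + d - 1, d - 1)

def symmetric_weight(n, d):
    """Eigenvalue 1/d_n of the averaged n-copy state on the symmetric subspace."""
    return Fraction(1, symmetric_dim(n, d))

# Two and three qubits (d = 2): compare with the S = 1 and S = 3/2 entries.
print(symmetric_weight(2, 2), symmetric_weight(3, 2))
```

For qubits the symmetric subspace of n copies is the spin-n/2 multiplet, so $d_n = n + 1$.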



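Now going back to the general-dimensional case, we
expand Eq. (1) and perform averaging over $\rho$ by use of
the formula (2):

$$\langle\Delta\rangle = \sum_a \left\langle \mathrm{tr}\left[E_a\rho^{\otimes N}\right]\left(\omega_a^2 - 2\omega_a\, \mathrm{tr}[\rho\Omega] + (\mathrm{tr}[\rho\Omega])^2\right)\right\rangle = \left\langle \Delta_1(\rho) + \Delta_2(\rho) + \Delta_3(\rho) \right\rangle, \tag{5}$$

where we denote the three terms in $\Delta(\rho)$ by $\Delta_1(\rho)$,
$\Delta_2(\rho)$, and $\Delta_3(\rho)$, and we evaluate each separately. The
first term $\langle\Delta_1\rangle$ is readily calculated as

$$\langle\Delta_1\rangle = \frac{1}{d_N}\sum_a \omega_a^2\, \mathrm{tr}[E_a S_N]. \tag{6}$$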

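For $\langle\Delta_3\rangle$, we first use the completeness of the POVM by
summing over $a$ and perform the average in the following
way:

$$\langle\Delta_3\rangle = \left\langle (\mathrm{tr}[\rho\Omega])^2 \right\rangle = \left\langle \mathrm{tr}\left[\rho^{\otimes 2}\,\Omega(1)\Omega(2)\right] \right\rangle = \frac{1}{d_2}\,\mathrm{tr}[S_2\,\Omega(1)\Omega(2)] = \frac{1}{d(d+1)}\left((\mathrm{tr}\,\Omega)^2 + \mathrm{tr}\,\Omega^2\right), \tag{7}$$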



where $\rho^{\otimes 2}$ is understood to be the tensor product of two
$\rho$'s in spaces 1 and 2, and the space on which the operator
$\Omega$ acts is specified by the number in the parentheses.
Hereafter we will use this convention in more general
cases, namely,

$$\Omega(n) = 1^{\otimes (n-1)} \otimes \Omega \otimes 1 \otimes 1 \otimes \cdots . \tag{8}$$
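The closed form $\langle(\mathrm{tr}[\rho\Omega])^2\rangle = ((\mathrm{tr}\,\Omega)^2 + \mathrm{tr}\,\Omega^2)/(d(d+1))$ of Eq. (7) can be checked numerically. The sketch below is not part of the original derivation. It assumes $\Omega$ is diagonal in the basis $\{|i\rangle\}$, which loses no generality under a unitary invariant distribution, and it samples states by normalizing a complex Gaussian vector, a standard way to draw points uniformly from the unit hypersphere. All function names are illustrative.

```python
import random

def random_pure_state_probs(d, rng):
    """Return |c_i|^2 for a pure state drawn uniformly from the unit sphere in C^d."""
    # Independent Gaussian real and imaginary parts, then normalization.
    weights = [rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2 for _ in range(d)]
    total = sum(weights)
    return [w / total for w in weights]

def mc_mean_squared_expectation(eigenvalues, samples=100_000, seed=0):
    """Monte Carlo estimate of <(tr[rho Omega])^2> for diagonal Omega."""
    rng = random.Random(seed)
    d = len(eigenvalues)
    acc = 0.0
    for _ in range(samples):
        probs = random_pure_state_probs(d, rng)
        expectation = sum(p * w for p, w in zip(probs, eigenvalues))
        acc += expectation ** 2
    return acc / samples

def exact_mean_squared_expectation(eigenvalues):
    """Closed form: ((tr Omega)^2 + tr Omega^2) / (d (d + 1))."""
    d = len(eigenvalues)
    tr = sum(eigenvalues)
    tr2 = sum(w * w for w in eigenvalues)
    return (tr ** 2 + tr2) / (d * (d + 1))
```

For example, `mc_mean_squared_expectation([1.0, 2.0, 4.0])` should land close to `exact_mean_squared_expectation([1.0, 2.0, 4.0])`, up to the statistical error of the sampling.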



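Evaluation of the second term $\langle\Delta_2\rangle$ is more involved.
Introducing another system on $\mathcal{H}$, which we call system
N + 1, we have

$$\langle\Delta_2\rangle = -2\sum_a \omega_a \left\langle \mathrm{tr}\left[E_a\rho^{\otimes N}\right]\mathrm{tr}[\rho\Omega] \right\rangle = -2\sum_a \omega_a \left\langle \mathrm{tr}\left[E_a\, \rho^{\otimes (N+1)}\,\Omega(N+1)\right] \right\rangle = -\frac{2}{d_{N+1}}\sum_a \omega_a\, \mathrm{tr}\left[E_a S_{N+1}\,\Omega(N+1)\right], \tag{9}$$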



where the traces in the second and third expressions are
understood to be over systems 1, 2, ..., N, and N + 1. The
operator $\Omega(N+1)$ acts on system N + 1. The projection
operator $S_{N+1}$ is the sum of all permutation operators of
N + 1 systems divided by a factor of (N + 1)!. Any permutation
of N + 1 objects is either just a permutation
among the first N objects or the product of a permutation
among the first N objects and the transposition
between the (N + 1)th object and one of the first N objects.
With this observation we find, for any operator
$\Omega$,

$$\mathrm{tr}_{N+1}\left[S_{N+1}\,\Omega(N+1)\right] = \frac{1}{N+1}\, S_N \left(\mathrm{tr}\,\Omega + \sum_{n=1}^{N}\Omega(n)\right), \tag{10}$$



where $\mathrm{tr}_{N+1}$ is the trace over the (N + 1)st system. We
use this formula to trace out the newly introduced system
N + 1 in the expression for $\langle\Delta_2\rangle$ given by Eq. (9). The result
is given by

$$\langle\Delta_2\rangle = -\frac{2}{d_N}\sum_a \omega_a\, \mathrm{tr}\left[E_a S_N \bar{\Omega}\right], \tag{11}$$

where we define the symmetric one-body operator $\bar{\Omega}$ to
be

$$\bar{\Omega} = \frac{1}{N+d}\left(\mathrm{tr}\,\Omega + \sum_{n=1}^{N}\Omega(n)\right). \tag{12}$$
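The step from Eqs. (9) and (10) to Eqs. (11) and (12) trades the prefactor $1/(d_{N+1}(N+1))$ for $1/(d_N(N+d))$, which rests on the identity $d_{N+1}/d_N = (N+d)/(N+1)$. The following sketch checks that identity with exact arithmetic for a range of small N and d; the function names are illustrative only.

```python
from fractions import Fraction
from math import comb

def symmetric_dim(n, d):
    """Dimension d_n = C(n + d - 1, d - 1) of the totally symmetric subspace."""
    return comb(n + d - 1, d - 1)

def prefactor_before(N, d):
    """(1 / d_{N+1}) * (1 / (N + 1)): what remains after tracing out system N + 1."""
    return Fraction(1, symmetric_dim(N + 1, d)) * Fraction(1, N + 1)

def prefactor_after(N, d):
    """(1 / d_N) * (1 / (N + d)): the combination used in Eqs. (11) and (12)."""
    return Fraction(1, symmetric_dim(N, d)) * Fraction(1, N + d)

# The two prefactors agree for every N and d tried here.
print(all(prefactor_before(N, d) == prefactor_after(N, d)
          for N in range(1, 7) for d in range(2, 6)))
```

This is why the factor N + d, rather than N + 1, appears in the definition of the symmetric one-body operator.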



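Combining the three averages $\langle\Delta_1\rangle$, $\langle\Delta_2\rangle$, and $\langle\Delta_3\rangle$, we
obtain

$$\langle\Delta\rangle = \frac{1}{d_N}\sum_a \mathrm{tr}\left[E_a S_N\left(\omega_a^2 - 2\omega_a\bar{\Omega}\right)\right] + \frac{1}{d(d+1)}\left((\mathrm{tr}\,\Omega)^2 + \mathrm{tr}\,\Omega^2\right). \tag{13}$$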



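To minimize $\langle\Delta\rangle$ we complete the square with respect to
$\omega_a$ in this expression. Owing to the completeness of the
POVM, this is reduced to the calculation of $\mathrm{tr}[S_N\bar{\Omega}^2]$,
which can be performed by using the following formulas:

$$\mathrm{tr}[S_N\,\Omega(n)] = \frac{d_N}{d}\,\mathrm{tr}\,\Omega, \tag{14}$$

$$\mathrm{tr}[S_N\,\Omega(n)\Omega(m)] = \begin{cases} \dfrac{d_N}{d}\,\mathrm{tr}[\Omega^2] & (n = m), \\[2mm] \dfrac{d_N}{d(d+1)}\left(\mathrm{tr}[\Omega^2] + (\mathrm{tr}\,\Omega)^2\right) & (n \neq m). \end{cases} \tag{15}$$

After some calculation we find

$$\mathrm{tr}\left[S_N\bar{\Omega}^2\right] = \frac{d_N}{d(d+1)(N+d)}\left(N\,\mathrm{tr}[\Omega^2] + (N+d+1)(\mathrm{tr}\,\Omega)^2\right). \tag{16}$$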



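We thus finally obtain the mean squared error in the
completed square form

$$\langle\Delta\rangle = \frac{1}{d_N}\sum_a \mathrm{tr}\left[E_a S_N\left(\omega_a - \bar{\Omega}\right)^2\right] + \frac{1}{d(d+1)(N+d)}\left(d\,\mathrm{tr}[\Omega^2] - (\mathrm{tr}\,\Omega)^2\right). \tag{17}$$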
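For readers who want the intermediate step, the constant term of Eq. (17) can be recovered as sketched below. The shorthand $t = \mathrm{tr}\,\Omega$ and $q = \mathrm{tr}[\Omega^2]$ is introduced here only for brevity. Completing the square in Eq. (13) with $\sum_a E_a = 1$ leaves the constant $-\mathrm{tr}[S_N\bar{\Omega}^2]/d_N + (t^2+q)/(d(d+1))$, and Eq. (16) gives $\mathrm{tr}[S_N\bar{\Omega}^2] = d_N\,(Nq + (N+d+1)\,t^2)/(d(d+1)(N+d))$, so that

```latex
\begin{align*}
-\frac{1}{d_N}\,\mathrm{tr}\left[S_N\bar{\Omega}^2\right] + \frac{t^2 + q}{d(d+1)}
 &= \frac{-\left(Nq + (N+d+1)\,t^2\right) + (N+d)\left(t^2 + q\right)}{d(d+1)(N+d)} \\
 &= \frac{d\,q - t^2}{d(d+1)(N+d)} .
\end{align*}
```

The terms proportional to N cancel in the numerator, which is why the bound depends on N only through the overall factor 1/(N + d).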



Now note that the first term in Eq. (17) is positive.
This is because $\bar{\Omega}$ is symmetric under exchange
of component subsystems and therefore $S_N(\omega_a - \bar{\Omega})^2 =
S_N(\omega_a - \bar{\Omega})^2 S_N$ is a positive operator. Thus $\langle\Delta\rangle$ has a
lower bound given by the second term of Eq. (17). Let
us denote the eigenvalues of $\Omega$ by $\Omega_i$ (i = 1, ..., d) and
the corresponding eigenstates by $|i\rangle$. It is then readily
seen that this lower bound can be achieved if the index
$a$ of the POVM element collectively represents the set
$\{i_1, i_2, \ldots, i_N\}$, the POVM element is taken to be the
projector

$$E_{i_1 i_2 \cdots i_N} = |i_1 i_2 \cdots i_N\rangle\langle i_1 i_2 \cdots i_N|, \tag{18}$$



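and the estimate $\omega_a$ to be the corresponding eigenvalue
of $\bar{\Omega}$,

$$\omega_{i_1 i_2 \cdots i_N} = \frac{1}{N+d}\left(\mathrm{tr}\,\Omega + \sum_{n=1}^{N}\Omega_{i_n}\right). \tag{19}$$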



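Thus we conclude that the mean squared error $\langle\Delta\rangle$ in
the estimation for the expectation value of the observable
$\Omega$ takes its minimum value given by

$$\Delta_{\mathrm{opt}} = \frac{1}{d(d+1)(N+d)}\left(d\,\mathrm{tr}[\Omega^2] - (\mathrm{tr}\,\Omega)^2\right), \tag{20}$$

if one measures the observable $\Omega$ independently for each
system and makes the estimate given by

$$\omega_{\mathrm{opt}} = \frac{1}{N+d}\left(\mathrm{tr}\,\Omega + \sum_{n=1}^{N}\Omega_{i_n}\right), \tag{21}$$

where $\{\Omega_{i_1}, \Omega_{i_2}, \ldots, \Omega_{i_N}\}$ are the data observed by the
measurement.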
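As a concrete illustration, the estimate of Eq. (21), namely (tr Ω plus the sum of the observed eigenvalues) divided by N + d, can be written as a short function next to the plain sample average. This is only a sketch: the function names are hypothetical, and integer or rational eigenvalues are assumed so that the arithmetic stays exact.

```python
from fractions import Fraction

def optimal_estimate(eigenvalues, observed):
    """(tr Omega + sum of observed eigenvalues) / (N + d), as in Eq. (21)."""
    d = len(eigenvalues)   # dimension: one eigenvalue per basis state
    N = len(observed)      # number of measured copies
    return Fraction(sum(eigenvalues) + sum(observed), N + d)

def sample_average(observed):
    """Plain arithmetic mean of the observed eigenvalues."""
    return Fraction(sum(observed), len(observed))

# Single qubit, sigma_z measured once with outcome +1.
print(optimal_estimate([1, -1], [1]), sample_average([1]))
```

With ten identical outcomes +1 the two estimates are 10/12 and 1, so the gap closes as N grows relative to d.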

The optimal estimate $\omega_{\mathrm{opt}}$ is not the arithmetic average
of observed data (the sample average), though the
optimal measurement is projective and independent. For
a finite N, it is not unbiased either since

$$\sum_a \omega_a\, \mathrm{tr}\left[E_a\rho^{\otimes N}\right] = \mathrm{tr}\left[\bar{\Omega}\,\rho^{\otimes N}\right] = \frac{1}{N+d}\left(\mathrm{tr}\,\Omega + N\,\mathrm{tr}[\rho\Omega]\right), \tag{22}$$

which only asymptotically approaches $\mathrm{tr}[\rho\Omega]$. In Sec. IV
we will discuss the biasedness of $\omega_{\mathrm{opt}}$ and present an
interpretation of its structure.
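The bias can be made explicit by averaging the estimate over all outcome strings of N independent projective measurements. The sketch below does this by brute-force enumeration and is meant only as a check of the statement that the mean equals (tr Ω + N tr[ρΩ])/(N + d). The name `expected_optimal_estimate` and the argument `probs` (the probabilities $\langle i|\rho|i\rangle$) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def expected_optimal_estimate(eigenvalues, probs, N):
    """Mean of (tr Omega + sum of outcomes) / (N + d) over all outcome strings.

    probs[i] is the probability of observing eigenvalues[i] on one copy.
    """
    d = len(eigenvalues)
    trace = sum(eigenvalues)
    mean = Fraction(0)
    for outcome in product(range(d), repeat=N):
        weight = Fraction(1)
        for i in outcome:
            weight *= probs[i]          # independent measurements
        estimate = Fraction(trace + sum(eigenvalues[i] for i in outcome), N + d)
        mean += weight * estimate
    return mean

# Qubit with tr[rho sigma_z] = 1/2, three copies.
probs = [Fraction(3, 4), Fraction(1, 4)]
print(expected_optimal_estimate([1, -1], probs, 3))
```

Here the true value is 1/2 while the mean of the estimate is (0 + 3/2)/5 = 3/10, so the estimate is pulled toward tr Ω/d = 0.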

What do we obtain for the mean squared error if we
take the sample average of the values of $\Omega$ observed by
the successive measurements on each copy? In this case
the POVM is given by Eq. (18) and the estimate by

$$\omega_{\mathrm{av}} = \frac{1}{N}\sum_{n=1}^{N}\Omega_{i_n}, \tag{23}$$



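which can be easily shown to be unbiased. The squared
error for a given $\rho$ given in Eq. (1) takes the form

$$\Delta_{\mathrm{av}}(\rho) = \frac{1}{N}\left(\mathrm{tr}[\rho\Omega^2] - (\mathrm{tr}[\rho\Omega])^2\right). \tag{24}$$

After the average over $\rho$ we have

$$\Delta_{\mathrm{av}} = \langle\Delta_{\mathrm{av}}(\rho)\rangle = \frac{1}{d(d+1)N}\left(d\,\mathrm{tr}\,\Omega^2 - (\mathrm{tr}\,\Omega)^2\right), \tag{25}$$

where we used Eq. (7).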

Comparing $\Delta_{\mathrm{opt}}$ and $\Delta_{\mathrm{av}}$, we find that the only difference
between them is in the factor in the denominators,
N + d in $\Delta_{\mathrm{opt}}$ and N in $\Delta_{\mathrm{av}}$. While $\Delta_{\mathrm{opt}}$ is certainly
less than $\Delta_{\mathrm{av}}$, both show the same asymptotics when the
number of copies goes to infinity. The difference becomes
important when the number of copies is comparable to
the dimension of the system.

Let us examine the example discussed in Sec. I, in
which $\sigma_z$ is measured with the result 1 for a single qubit
in an unknown pure state (d = 2 and N = 1). In this
case the observed data is {1}. The estimate by the sample
average gives $\omega_{\mathrm{av}} = 1$ for the expectation value of
$\sigma_z$ with the mean squared error $\Delta_{\mathrm{av}} = 2/3$, whereas the
optimal estimation predicts $\omega_{\mathrm{opt}} = 1/3$ with the mean
squared error $\Delta_{\mathrm{opt}} = 2/9$.
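The two averaged errors, (d tr Ω² − (tr Ω)²)/(d(d+1)(N+d)) for the optimal estimate and the same expression with N in place of N + d for the sample average, can be compared with a small simulation. This is a sketch under stated assumptions rather than the authors' procedure: $\Omega$ is taken diagonal, random pure states are drawn by normalizing complex Gaussian vectors, and all function names are hypothetical.

```python
import random
from fractions import Fraction

def delta_opt(eigenvalues, N):
    """Closed form of the minimal averaged squared error, Eq. (20)."""
    d = len(eigenvalues)
    tr = sum(eigenvalues)
    tr2 = sum(w * w for w in eigenvalues)
    return Fraction(d * tr2 - tr * tr, d * (d + 1) * (N + d))

def delta_av(eigenvalues, N):
    """Closed form of the averaged squared error of the sample average, Eq. (25)."""
    d = len(eigenvalues)
    tr = sum(eigenvalues)
    tr2 = sum(w * w for w in eigenvalues)
    return Fraction(d * tr2 - tr * tr, d * (d + 1) * N)

def simulate(eigenvalues, N, trials=100_000, seed=1):
    """Monte Carlo estimate of (error of optimal estimate, error of sample average)."""
    rng = random.Random(seed)
    d = len(eigenvalues)
    tr = sum(eigenvalues)
    err_opt = err_av = 0.0
    for _ in range(trials):
        # Outcome probabilities |c_i|^2 of a uniformly random pure state.
        weights = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(d)]
        total = sum(weights)
        probs = [w / total for w in weights]
        truth = sum(p * w for p, w in zip(probs, eigenvalues))
        # N independent projective measurements of Omega.
        data = rng.choices(eigenvalues, weights=probs, k=N)
        err_opt += ((tr + sum(data)) / (N + d) - truth) ** 2
        err_av += (sum(data) / N - truth) ** 2
    return err_opt / trials, err_av / trials
```

For the single-qubit example, `delta_opt([1, -1], 1)` and `delta_av([1, -1], 1)` reproduce 2/9 and 2/3, and `simulate([1, -1], 1)` should return values close to these within sampling error.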

III. ESTIMATION WITH THE UNBIASEDNESS
CONDITION

In Ref. [?], D'Ariano, Giovannetti, and Perinotti considered
the estimation for the expectation of observables
under the unbiasedness condition for any generally mixed
state $\rho^{\otimes N}$ and showed that the optimal estimate under
the constraint is given by the sample average obtained by
the independent successive measurement of the observable
on each copy. In this section we briefly discuss the
same problem in the pure-state case and show that the same
conclusion holds.

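The unbiasedness condition is written as

$$\sum_a \omega_a\, \mathrm{tr}\left[E_a\rho^{\otimes N}\right] = \mathrm{tr}[\Omega\rho]. \tag{26}$$

Note that $\mathrm{tr}[\Omega\rho]$ on the right-hand side can be expressed
as

$$\mathrm{tr}[\Omega\rho] = \mathrm{tr}\left[\Omega_{\mathrm{av}}\,\rho^{\otimes N}\right], \tag{27}$$

$$\Omega_{\mathrm{av}} \equiv \frac{1}{N}\sum_{n=1}^{N}\Omega(n). \tag{28}$$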


If the unbiasedness condition (26) is assumed for any generally
mixed state $\rho$, then it can be shown [?] that

$$\sum_a \omega_a E_a = \Omega_{\mathrm{av}} \tag{29}$$

for any permutation-invariant POVM $\{E_a\}$. If we require
the unbiasedness condition for any pure state $\rho$, we
can still show that the relation (29) holds in the totally
symmetric subspace of $\mathcal{H}^{\otimes N}$, namely,

$$S_N\left(\sum_a \omega_a E_a - \Omega_{\mathrm{av}}\right)S_N = 0. \tag{30}$$



This follows from a lemma for an operator $A$ on $\mathcal{H}^{\otimes N}$:

$\mathrm{tr}[A\rho^{\otimes N}] = 0$ for any pure state $\rho$,
if and only if $S_N A S_N = 0$.

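The "if" part is trivial and we sketch the proof of the
"only if" part. We write $|\phi\rangle = \sum_{i=1}^{d} c_i |i\rangle$ in terms of a
basis $\{|i\rangle\}$ of $\mathcal{H}$, where $\rho = |\phi\rangle\langle\phi|$. Then we have

$$\begin{aligned}
\mathrm{tr}\left[A\rho^{\otimes N}\right]
 &= \sum_{i_1 \cdots i_N,\; j_1 \cdots j_N} c_{i_1}^* c_{i_2}^* \cdots c_{i_N}^*\, c_{j_1} c_{j_2} \cdots c_{j_N}\, \langle i_1 i_2 \cdots i_N | A | j_1 j_2 \cdots j_N \rangle \\
 &= \sum_{n_i,\, m_i} (c_1^*)^{n_1} (c_2^*)^{n_2} \cdots (c_d^*)^{n_d}\, c_1^{m_1} c_2^{m_2} \cdots c_d^{m_d}\, \langle \psi_{n_1 n_2 \cdots n_d} | A | \psi_{m_1 m_2 \cdots m_d} \rangle = 0,
\end{aligned} \tag{31}$$

where the summation over integers $n_i \geq 0$ and $m_i \geq 0$
should be taken under the conditions $\sum_i n_i = \sum_i m_i = N$,
and the state $|\psi_{n_1 n_2 \cdots n_d}\rangle$ is the occupation-number
representation of symmetric states (generally not normalized),
with $n_i$ being the occupation number of state $i$.
Equation (31) should hold for any complex $c_i$, implying
$\langle \psi_{n_1 n_2 \cdots n_d} | A | \psi_{m_1 m_2 \cdots m_d} \rangle = 0$.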

The difference between the two unbiasedness conditions
(29) and (30) is the projection operator $S_N$ in the pure-state
case. This, however, does not hamper the subsequent
argument since the support of the operator $\rho^{\otimes N}$
for pure $\rho$ is the totally symmetric subspace.

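We go back to the expanded form of $\Delta(\rho)$ as in Eq. (5),
but before being averaged over $\rho$. By using the unbiasedness
condition (30) we readily find $\Delta_2(\rho) = -2\Delta_3(\rho)$ so that
we have

$$\Delta(\rho) = \sum_a \omega_a^2\, \mathrm{tr}\left[E_a\rho^{\otimes N}\right] - (\mathrm{tr}[\rho\Omega])^2. \tag{32}$$

It can be shown that

$$\sum_a \omega_a^2\, \mathrm{tr}\left[E_a\rho^{\otimes N}\right] \geq \mathrm{tr}\left[\Omega_{\mathrm{av}}^2\,\rho^{\otimes N}\right], \tag{33}$$

since in the symmetric subspace we have

$$0 \leq \sum_a \left(\omega_a - \Omega_{\mathrm{av}}\right) E_a \left(\omega_a - \Omega_{\mathrm{av}}\right) = \sum_a \omega_a^2 E_a - \Omega_{\mathrm{av}}^2. \tag{34}$$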



It is evident that the equality holds if the POVM element
$E_a$ is the projector onto an eigenstate of $\Omega_{\mathrm{av}}$ and
the estimate $\omega_a$ is the corresponding eigenvalue, which is
the sample average of the observed values of $\Omega$ for each
copy. Thus the minimum value of the squared error in
the unbiased estimation is given by

$$\mathrm{tr}\left[\Omega_{\mathrm{av}}^2\,\rho^{\otimes N}\right] - (\mathrm{tr}[\rho\Omega])^2 = \frac{1}{N}\left(\mathrm{tr}[\rho\Omega^2] - (\mathrm{tr}[\rho\Omega])^2\right) = \Delta_{\mathrm{av}}(\rho), \tag{35}$$

which shows that the conclusion of Ref. [?] holds if we
restrict ourselves to the pure-state input ensemble. Averaging
over $\rho$ gives $\Delta_{\mathrm{av}}$ given in Eq. (25).
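The statement that the sample average has squared error $(\mathrm{tr}[\rho\Omega^2] - (\mathrm{tr}[\rho\Omega])^2)/N$ for a fixed state is the usual variance-of-the-mean result, and it can be confirmed by exact enumeration for small N. The sketch below assumes $\Omega$ is measured projectively on each copy, with `probs[i]` standing for $\langle i|\rho|i\rangle$; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def sample_average_mse(eigenvalues, probs, N):
    """Exact mean squared error of the sample average over all outcome strings."""
    d = len(eigenvalues)
    truth = sum(p * w for p, w in zip(probs, eigenvalues))   # tr[rho Omega]
    mse = Fraction(0)
    for outcome in product(range(d), repeat=N):
        weight = Fraction(1)
        for i in outcome:
            weight *= probs[i]
        average = Fraction(sum(eigenvalues[i] for i in outcome), N)
        mse += weight * (average - truth) ** 2
    return mse

def variance_over_n(eigenvalues, probs, N):
    """(tr[rho Omega^2] - tr[rho Omega]^2) / N."""
    first = sum(p * w for p, w in zip(probs, eigenvalues))
    second = sum(p * w * w for p, w in zip(probs, eigenvalues))
    return (second - first ** 2) / N

probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(sample_average_mse([2, 0, -1], probs, 3) == variance_over_n([2, 0, -1], probs, 3))
```

The enumeration grows as d^N, so it is only practical as a check for a handful of copies.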



IV. DISCUSSION AND CONCLUDING 
REMARKS 

We have seen that the optimal estimation of the expec- 
tation value of a physical observable is biased, though the 
optimal measurement is given by the successive projective
measurement of the observable. The optimal estimate
$\omega_{\mathrm{opt}}$ is not given by the arithmetic average of observed
data.

We can interpret the expression (21) of the optimal estimate
$\omega_{\mathrm{opt}}$ in the following way. First of all, we should
remember that we have full knowledge of the properties of
the observable $\Omega$ including its eigenvalues. Otherwise we
cannot perform a measurement associated with $\Omega$. Then
what can we expect for outcomes of the $\Omega$ measurement
before performing the measurement? The state $\rho$ is given
to us according to the unitary invariant distribution on
the pure-state space, implying that we expect that each
eigenvalue $\Omega_i$ occurs with equal probability as the outcome
of the $\Omega$ measurement. This a priori knowledge
should be somehow taken into account in the estimation.
We can see that this a priori knowledge is incorporated
into the optimal estimate $\omega_{\mathrm{opt}}$ in a natural way. It is
just the arithmetic average of N observed data points
$\{\Omega_{i_n}\}_{n=1}^{N}$ and the d "default nonobserved" data points
$\{\Omega_i\}_{i=1}^{d}$, the latter of which add up to the trace of the
observable.

One may still wonder why the weights of the average
for the observed and nonobserved data are equal. Actually
this is a feature of the pure-state ensemble considered
in this paper. To see this, let us take the simplest
example of d = 2 and N = 1, but this time the state $\rho$
is generally mixed. We assume that the Bloch vector $\mathbf{n}$
is distributed isotropically inside the Bloch sphere. The
ensemble is characterized by the average $\langle\mathbf{n}^2\rangle$, which is 1
for the pure-state ensemble, but generally less than 1.

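After some calculation, the mean squared error turns
out to be

$$\langle\Delta\rangle = \frac{1}{2}\sum_a \mathrm{tr}\left[E_a\left(\omega_a - \bar{\Omega}\right)^2\right] + \frac{\langle\mathbf{n}^2\rangle}{12}\left(1 - \frac{\langle\mathbf{n}^2\rangle}{3}\right)\left(2\,\mathrm{tr}[\Omega^2] - (\mathrm{tr}\,\Omega)^2\right), \tag{36}$$

where

$$\bar{\Omega} = \frac{1}{2}\left(1 - \frac{\langle\mathbf{n}^2\rangle}{3}\right)\mathrm{tr}\,\Omega + \frac{\langle\mathbf{n}^2\rangle}{3}\,\Omega. \tag{37}$$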



This implies that the optimal measurement is the projective
measurement of $\Omega$, and the optimal estimate is given
by

$$\omega_{\mathrm{opt}} = \frac{1}{2}\left(1 - \frac{\langle\mathbf{n}^2\rangle}{3}\right)\mathrm{tr}\,\Omega + \frac{\langle\mathbf{n}^2\rangle}{3}\,\Omega_i, \tag{38}$$

where $\Omega_i$ is the observed eigenvalue of $\Omega$. The minimal
mean squared error is given by the second term of
Eq. (36). We can see that the weight for the observed
data decreases as the degree of mixing of the ensemble
increases. When $\langle\mathbf{n}^2\rangle = 0$, this $\omega_{\mathrm{opt}}$ implies we should
disregard the observed data. The reason is that we know
that the expectation value is given by $\mathrm{tr}\,\Omega/2$ for a completely
mixed state.
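The dependence on the degree of mixing can be tabulated with a short sketch. It takes the estimate to read $\omega_{\mathrm{opt}} = \frac{1}{2}(1 - \langle\mathbf{n}^2\rangle/3)\,\mathrm{tr}\,\Omega + (\langle\mathbf{n}^2\rangle/3)\,\Omega_i$ and the minimal error to read $(\langle\mathbf{n}^2\rangle/12)(1 - \langle\mathbf{n}^2\rangle/3)(2\,\mathrm{tr}[\Omega^2] - (\mathrm{tr}\,\Omega)^2)$, which is an assumption about the form of Eqs. (36) and (38) chosen to be consistent with the pure-state results for d = 2 and N = 1. The function names and the argument `n2` (standing for $\langle\mathbf{n}^2\rangle$) are hypothetical.

```python
from fractions import Fraction

def mixed_qubit_estimate(eigenvalues, observed_value, n2):
    """Estimate for d = 2, N = 1 with isotropic Bloch vectors; n2 = <n^2> in [0, 1]."""
    trace = sum(eigenvalues)
    n2 = Fraction(n2)
    # Weight (1 - n2/3)/2 on the trace, weight n2/3 on the observed eigenvalue.
    return Fraction(1, 2) * (1 - n2 / 3) * trace + (n2 / 3) * observed_value

def mixed_qubit_min_error(eigenvalues, n2):
    """Minimal mean squared error for the same ensemble."""
    trace = sum(eigenvalues)
    trace_sq = sum(w * w for w in eigenvalues)
    n2 = Fraction(n2)
    return (n2 / 12) * (1 - n2 / 3) * (2 * trace_sq - trace ** 2)

# Pure-state ensemble (n2 = 1) versus completely mixed states (n2 = 0).
print(mixed_qubit_estimate([1, -1], 1, 1), mixed_qubit_estimate([3, 1], 3, 0))
```

At `n2 = 1` the functions reduce to the pure-state values 1/3 and 2/9 for $\sigma_z$, and at `n2 = 0` the estimate ignores the observed eigenvalue and returns tr Ω/2.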

The generalization of our analysis to an ensemble of 
mixed states, including the details of the above discus- 
sion, will be presented elsewhere. 